From: Jonathan Dieter Date: Fri, 20 Apr 2018 17:27:20 +0000 (+0300) Subject: Add zchunk format description X-Git-Tag: archive/raspbian/1.1.9+ds1-1+rpi1~1^2~306 X-Git-Url: https://dgit.raspbian.org/%22http://www.example.com/cgi/%22/%22http:/www.example.com/cgi/%22?a=commitdiff_plain;h=7ab12f113c75cef064dd1607cbabe96d96929cbd;p=zchunk.git Add zchunk format description Signed-off-by: Jonathan Dieter --- diff --git a/zchunk_format.txt b/zchunk_format.txt new file mode 100644 index 0000000..dd12864 --- /dev/null +++ b/zchunk_format.txt @@ -0,0 +1,183 @@ ++-+-+-+-+-+-+-+-+-+====================+=================+ +| ID | Flags | Checksum type (ci) | Header checksum | ++-+-+-+-+-+-+-+-+-+====================+=================+ + ++========================+=================+=======+ +| Compression type (ci ) | Index size (ci) | Index | ++========================+=================+=======+ + ++======================+============+ +| Signature size (ci) | Signatures | ++======================+============+ + ++=================+===========+===========+ +| Compressed Dict | Chunk | Chunk | ==> More chunks ++=================+===========+===========+ + +(ci) + Compressed (unsigned) integer - An variable length little endian + integer where the first seven bits of the number are stored in the + first byte, followed by the next seven bits in the next byte, and so + on. The top bit of all bytes except the final byte must be zero, and + the top bit of the final byte must be one, indicating the end of the + number. + +ID + '\0ZCK1', identifies file as zchunk version 1 file + +Flags + 32 bits for flags. All unused flags MUST be set to 0. If a decoder sees + a flag set that it doesn't recognize, it MUST exit with an error. Flags + + Current flags are: + bit 0: File has data streams + +Checksum type + This is an integer containing the type of checksum used to generate the + header checksum and the total data checksum, but *not* the chunk checksums. + + Current values: + 0 = SHA-1 + 1 = SHA-256 + +Header checksum + This is the checksum of everything from the beginning of the file + until the end of the index, ignoring the header checksum. + +Compression type + This is an integer containing the type of compression used to + compress dict and chunks. + + Current values: + 0 - Uncompressed + 2 - zstd + +Index size + This is an integer containing the size of the index. + +Index + This is the index, which is described in the next section. + +Signature size + This is an integer countaining the size of the signature section. + +Signatures + These are the signatures, described in a later section. + +Compressed Dict (optional) + This is a custom dictionary used when compressing each chunk. + Because each chunk is compressed completely separately from the + others, the custom dictionary gives us much better overall + compression. The custom dictionary is compressed without a custom + dictionary (for obvious reasons). + +Chunk + This is a chunk of data, compressed with the custom dictionary + provided above. + + +The index: + ++==========================+==================+===============+ +| Chunk checksum type (ci) | Chunk count (ci) | Data checksum | ++==========================+==================+===============+ + ++==================+===============+==================+ +| Dict stream (ci) | Dict checksum | Dict length (ci) | ++==================+===============+==================+ + ++===============================+ +| Uncompressed dict length (ci) | ++===============================+ + ++===================+================+===================+ +| Chunk stream (ci) | Chunk checksum | Chunk length (ci) | ++===================+================+===================+ + ++==========================+ +| Uncompressed length (ci) | ... ++==========================+ + +Chunk checksum type + This is an integer containing the type of checksum used to generate + the chunk checksums. + + Current values: + 0 = SHA-1 + 1 = SHA-256 + +Chunk count + This is a count of the number of chunks in the zchunk file. + +Checksum of all data + This is the checksum of everything after the index, including the + compressed dict and all the compressed chunks. This checksum is + generated using the overall checksum type, *not* the chunk checksum + type. + +Dict stream + If the data streams flag is set, this must always be 0, otherwise don't + include this integer + +Dict checksum + This is the checksum of the compressed dict, used to detect whether + two dicts are identical. If there is no dict, the checksum must be + all zeros. + +Dict length + This is an integer containing the length of the dict. If there is no + dict, this must be a zero. + +Uncompressed dict length + This is an integer containing the length of the dict after it has + been decompressed. If there is no dict, this must be a zero. + +Chunk stream + If the data streams flag is set, this indicates which stream this chunk + belongs to. 1 is the default, so decoders SHOULD decode stream 1 by default. + If the data streams flag isn't set, don't include this integer. + +Chunk checksum + This is the checksum of the compressed chunk, used to detect whether + any two chunks are identical. + +Chunk length + This is an integer containing the length of the chunk. + +Uncompressed dict length + This is an integer containing the length of the chunk after it has + been decompressed. + +The index is designed to be able to be extracted from the file on the +server and downloaded separately, to facilitate downloading only the +parts of the file that are needed, but must then be re-embedded when +assembling the file so the user only needs to keep one file. + +Streams can be used to separate file metadata and data. An example might be a +package format with the files stored in a tarball in stream 1, but the metadata +stored in stream 2. + + +The signatures: ++=====================+=====================+===========+ +| Signature type (ci) | Signature size (ci) | Signature | ... ++=====================+=====================+===========+ + +Signature type + This is an integer containing the type of signature. Currently there are + no recognized signature types. + +Signature size + This is an integer containing the size of the signature. + +Signature + The actual signature. The signature MUST only apply to the header, excluding + the header checksum, the signature count and the signatures. + +Signatures are designed so that anyone can add a new signature to a file +without changing the validity of other signatures, but the header checksum +must be recalculated. + +We only sign the header so the signature can be validated independently of the +data, though the data can then be validated through both the chunk checksums +and the full data checksum, both of which will be signed by the signatures.